82 research outputs found

    Assessment of query reweighing, by rocchio method in farsi information retrieval

    Because users often know little about the collections behind search engines and retrieval systems in general, they cannot express their information needs appropriately in queries. In other words, they lack the experience to formulate their needs in a way that finds related documents. Query expansion aims to help users improve and correct their queries. Using the feedback it receives from the user in the first stage, the retrieval system moves the query in the document space toward more related documents. Various approaches have been used in information retrieval systems; however, the efficacy of query expansion has not yet been assessed for Farsi information retrieval systems. This paper presents and assesses the basic Rocchio expansion model as a baseline for retrieving Farsi documents. The purpose of the study is to determine the effect of a standard, basic model of query expansion on Farsi document retrieval, so that researchers can compare their own query expansion results with the findings of this paper, which showed a straightforward and positive effect on Farsi document retrieval.
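As an illustration of the query movement described in the abstract above, here is a minimal sketch of the textbook Rocchio update: the new query is the original query plus a weighted centroid of the relevant documents, minus a weighted centroid of the non-relevant ones. The function name `rocchio`, the sparse dict representation, and the default values of `alpha`, `beta`, and `gamma` are assumptions for illustration only; the listing does not state the parameters or the term weighting the paper used.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query as a dict mapping term -> weight.

    query        -- dict term -> weight for the original query
    relevant     -- list of document vectors (dicts) judged relevant
    non_relevant -- list of document vectors (dicts) judged non-relevant
    The alpha/beta/gamma defaults are illustrative, not taken from the paper.
    """
    # Collect every term that appears in the query or in any judged document.
    terms = set(query)
    for doc in relevant + non_relevant:
        terms.update(doc)

    new_query = {}
    for term in terms:
        # Centroid weight of the term in each feedback set (0 if the set is empty).
        pos = sum(d.get(term, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
        neg = sum(d.get(term, 0.0) for d in non_relevant) / len(non_relevant) if non_relevant else 0.0
        weight = alpha * query.get(term, 0.0) + beta * pos - gamma * neg
        # Negative weights are conventionally dropped from the expanded query.
        if weight > 0:
            new_query[term] = weight
    return new_query
```

Terms that occur only in relevant documents enter the query with a positive weight, which is the expansion effect; terms that occur only in non-relevant documents end up negative and are discarded.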

    FuFaIR: a Fuzzy Farsi Information Retrieval System

    Persian (Farsi) is one of the languages of the Middle East. A significant number of Persian documents are available in digital form, and more are created every day. There is therefore a need for a high-precision information retrieval system for this language. This paper discusses the design, implementation, and testing of a fuzzy retrieval system for Persian called FuFaIR. The system also supports fuzzy quantifiers in its query language. Tests were conducted using a standard Persian test corpus called Hamshahri. The performance results obtained from FuFaIR are positive and indicate that FuFaIR could notably outperform well-known industry systems such as the vector space model.

    Hamshahri: A standard Persian Text Collection

    The Persian language is one of the dominant languages in the Middle East, so a significant number of Persian documents are available on the Web. Because Persian differs in nature from languages such as English, the design of information retrieval systems for Persian requires special considerations. However, there are relatively few studies on the retrieval of Persian documents in the literature, and one of the main reasons is the lack of a standard test collection. In this paper we introduce a standard Persian text collection, named Hamshahri, which is built from a large number of newspaper articles according to TREC specifications. Statistical information about the documents, the queries, and their relevance judgments is also presented. We believe that this collection is the largest Persian text collection to date.

    Effectiveness of Rich Document Representation in XML Retrieval

    Information Retrieval (IR) systems are built with different goals in mind. Some IR systems target high precision, that is, having more relevant documents on the first page of their results. Other systems may target high recall, that is, finding as many references as possible. In this paper we present a method of document representation called RDR for building XML retrieval engines with high specificity, that is, finding more relevant documents that are mostly about the query topic. Rich Document Representation (RDR) is a method of representing the content of a document with logical terms and statements. The conjecture is that, since RDR is a better representation of the document content, it will produce higher precision. In our implementation, we used the vector space model to compute the similarity between the XML elements and queries. Our experiments were conducted on the INEX 2004 test collection. The results indicate that the use of richer features such as logical terms or statements for XML retrieval tends to produce more focused retrieval. RDR is therefore a suitable document representation when users need only a few more specific references and are more interested in precision than recall.

    Understanding the Business Value Creation Process for Business Intelligence Tools in the UAE

    Background: In today’s dynamically changing business environment, organizations constantly strive to improve their performance, and business intelligence (BI) aids in this process. This is a key reason why BI has captured the attention of many academics and practitioners. Most research on BI highlights its importance and the ways in which it facilitates decision-making; however, there is limited research on how the use of BI tools and applications benefits the organization by enabling it to convert the millions of dollars spent on BI investment into business value. This study validates the “business value of BI” framework by evaluating the business value creation process for BI tools in the UAE. Method: The data for this empirical study were collected via interviews with 15 middle managers from various industries in the UAE, with a focus on BI tools and the use of BI in their organizations. Results: The results show that a business value creation process for BI exists and that it is divided into three subprocesses: the BI conversion process, the BI use process, and the BI competitive process. Conclusions: The use of BI has become necessary for organizations to compete effectively. The results of this study identify the various BI tools that organizations in the UAE use. Furthermore, the results highlight the business value creation process of BI tools and how the effective use of BI offers an information advantage that facilitates decision-making and eventually promotes financial and non-financial benefits for the organization. Available at: https://aisel.aisnet.org/pajais/vol11/iss3/4

    A novel sentiment analysis framework for monitoring the evolving public opinion in real-time: Case study on climate change

    Smart city analytics involves tracking, interpreting, and evaluating the sentiments and emotions that are shared via online social media channels. Sentiment analysis of social media posts has become increasingly prominent in recent years as a means of gaining insights into how members of the public perceive current affairs. Ongoing research in this domain has leveraged multiple types of sentiment analysis. However, although the existing approaches enable researchers to acquire retrospective insights into public opinion, they do not enable real-time evaluation. In addition, they are not readily scalable and require the analysis of a very large number of posts (in the millions) to support a more in-depth evaluation. The study outlined in this paper was designed to address these shortcomings by presenting a framework that facilitates real-time evaluation of the evolution of opinions shared by prominent public figures and their respective followers; that is, high-impact posts. The developed solution encompasses a bidirectional LSTM classifier that was implemented and tested using a dataset of 278,000 tweets related to the topic of climate change. The proposed classifier achieved the following accuracies: 88.41% for discrimination; 89.66% for anger; 87.01% for inspiration; and 87.52% for joy, with negative emotions being classified more accurately than positive emotions. Similarly, the sentiment classification yielded accuracies of 89.32% for support and 89.80% for strong support, as well as 88.14% for opposition and 87.52% for strong opposition. In addition, the findings indicate that geographic location, chosen topic, cultural sensitivities, and posting frequency all play a critical role in public reactions to posts and the ensuing perspectives people adopt. The solution stands out from existing retrospective analysis methods because it does not rely on the analysis of vast quantities of data records; rather, it can perform real-time, high-impact content analysis in a resource-efficient and sustainable manner. The framework can be used to generate insights into how public opinion is developing in real time, and so can have meaningful application within social media analysis efforts.

    Using OWA fuzzy operator to merge retrieval system results

    Get PDF
    With the rapid growth of information sources, it is essential to develop methods that retrieve the most relevant information according to user requirements. One way of improving the quality of retrieval is to use more than one retrieval engine, merge the retrieved results, and show a single ranked list to the user. Some studies suggest that combining the results of multiple search engines improves ranking when these engines are treated as independent experts. In this study, we investigated the performance of Persian retrieval by merging four different language modeling methods and two vector space models with Lnu.ltu and Lnc.btc weighting schemes. The experiments were conducted on a large Persian collection of news archives called the Hamshahri collection. Two variations of the Ordered Weighted Average (OWA) fuzzy operator, a quantifier-based OWA operator and a degree-of-importance-based OWA operator, were tested for merging the results. Our experimental results show that the OWA operators produce better precision and ranking than the weaker retrieval methods, but only minimal improvements over the stronger retrieval models.
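To make the merging step in the abstract above more concrete, here is a rough sketch of a quantifier-based OWA fusion. An OWA operator sorts the input scores in descending order and takes a weighted sum, so the weights attach to rank positions rather than to particular engines. The weights here come from a power-function quantifier Q(r) = r^a, which is one common choice; the function names, the parameter `a`, and the assumption that each engine's scores are already normalised to [0, 1] are all illustrative and not taken from the paper.

```python
def quantifier_weights(n, a=2.0):
    """OWA weights from the quantifier Q(r) = r**a: w_i = Q(i/n) - Q((i-1)/n)."""
    def q(r):
        return r ** a
    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]


def owa(scores, weights):
    """Ordered weighted average: weights apply to scores sorted in descending order."""
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def merge(runs, a=2.0):
    """Fuse several engines' results into one ranked list of document ids.

    runs -- list of dicts, one per engine, mapping doc id -> normalised score.
    A document missing from an engine's results is scored 0.0 for that engine.
    """
    weights = quantifier_weights(len(runs), a)
    docs = set().union(*runs)  # all doc ids seen by any engine
    fused = {d: owa([run.get(d, 0.0) for run in runs], weights) for d in docs}
    # Highest fused score first; ties broken by doc id for a stable order.
    return sorted(fused, key=lambda d: (-fused[d], d))
```

With `a = 1` the weights are uniform and the operator reduces to the plain average of the engines' scores; values of `a` above 1 shift weight toward a document's lower scores, so a document must be rated well by most engines to rank highly.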

    A Vector Based Method of Ontology Matching


    Investigation of the Lambda Parameter for Language Modeling Based Persian Retrieval

    Language modeling is one of the most powerful methods in information retrieval. Many retrieval systems based on language modeling have been developed and tested on English collections, so the evaluation of language modeling on collections in other languages is an interesting research issue. In this study, four different language modeling methods proposed by Hiemstra [1] were evaluated on a large Persian collection from a news archive. Furthermore, we study two different approaches that have been proposed for tuning the Lambda parameter in the method. Experimental results show that the performance of language models on Persian text improves after Lambda tuning. More specifically, the Witten-Bell method provides the best results.
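For readers unfamiliar with the Lambda parameter mentioned in the abstract above, the sketch below shows the general idea using simple linear interpolation: each query term's probability is a mixture of its relative frequency in the document and its relative frequency in the whole collection, with Lambda controlling the balance. This is a generic illustration under that assumption, not a reproduction of the four methods or the tuning approaches evaluated in the paper; the function name and arguments are hypothetical.

```python
import math
from collections import Counter


def lm_score(query, doc, collection_tf, collection_len, lam=0.5):
    """Log-likelihood of the query under a smoothed document language model.

    query          -- list of query terms
    doc            -- list of document terms
    collection_tf  -- mapping term -> frequency in the whole collection
    collection_len -- total number of term occurrences in the collection
    lam            -- weight on the document model (illustrative default)
    """
    tf = Counter(doc)
    total = 0.0
    for term in query:
        p_doc = tf[term] / len(doc) if doc else 0.0
        p_col = collection_tf.get(term, 0) / collection_len
        # Linear interpolation between document and collection models.
        p = lam * p_doc + (1.0 - lam) * p_col
        if p == 0.0:
            # The term is unseen everywhere, so the query has zero likelihood.
            return float("-inf")
        total += math.log(p)
    return total
```

A larger `lam` trusts the document's own term statistics more, while a smaller one leans on the collection, which matters most for short documents. Tuning amounts to choosing the value that gives the best retrieval quality on held-out queries.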